Multiple Uses of Frequent Sets and Condensed Representations (Extended Abstract)

Authors

  • Heikki Mannila
  • Hannu Toivonen
Abstract

In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used as a condensed representation for answering various types of queries. Given a table r with 0/1 values and a threshold σ, a frequent set of r is a set X of columns of r such that at least a fraction σ of the rows of r have a 1 in all the columns of X. Finding frequent sets is a first step in finding association rules, and there exist several efficient algorithms for finding the frequent sets. We show that frequent sets have wider applications than just finding association rules. We show that using the inclusion-exclusion principle one can obtain approximate confidences of arbitrary boolean rules. We derive bounds for the errors in the confidences, and show that information collected during the computation of frequent sets can also be used to provide individual error bounds for each clause. Experiments show that this method enables one to obtain different forms [...] from data extremely fast. Furthermore, we define a general notion of condensed representations, and show that frequent sets, samples and the data cube can be viewed as instantiations of this concept.
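As a rough illustration of the two ideas in the abstract, the sketch below computes the frequent sets of a tiny 0/1 table and then uses inclusion-exclusion over the stored frequencies to estimate how often a disjunction of columns holds. It is not the authors' algorithm: the enumeration is plain brute force rather than one of the efficient algorithms the abstract mentions, and the table, the threshold value, and the names `frequency`, `frequent_sets` and `disjunction_frequency` are all assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations


def frequency(rows, columns):
    """Fraction of rows that have a 1 in every column of `columns`."""
    cols = list(columns)
    hits = sum(all(row[c] for c in cols) for row in rows)
    return Fraction(hits, len(rows))


def frequent_sets(rows, sigma):
    """Brute-force enumeration (illustration only) of every non-empty
    column set whose frequency is at least `sigma`."""
    n_cols = len(rows[0])
    result = {}
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            f = frequency(rows, cols)
            if f >= sigma:
                result[frozenset(cols)] = f
    return result


def disjunction_frequency(freqs, columns):
    """Inclusion-exclusion estimate of the fraction of rows that have a 1
    in at least one of `columns`. Terms for sets that are not in `freqs`
    (i.e. not frequent) are treated as 0, so the result is approximate."""
    cols = list(columns)
    total = Fraction(0)
    for k in range(1, len(cols) + 1):
        sign = 1 if k % 2 == 1 else -1
        for subset in combinations(cols, k):
            total += sign * freqs.get(frozenset(subset), Fraction(0))
    return total


# Hypothetical table r with columns 0 (A), 1 (B), 2 (C).
rows = [
    (1, 1, 0),
    (1, 1, 1),
    (1, 0, 0),
    (0, 1, 1),
    (0, 0, 0),
]
sigma = Fraction(2, 5)
freqs = frequent_sets(rows, sigma)

# A-or-B uses only frequent sets, so the estimate is exact here.
a_or_b = disjunction_frequency(freqs, [0, 1])
# A-or-C needs the frequency of {A, C}, which is below sigma and missing,
# so the estimate overshoots the true value of 4/5.
a_or_c = disjunction_frequency(freqs, [0, 2])
```

In this toy table the estimate for "A or C" is 1 while the true frequency is 4/5. The missing term is the frequency of {A, C}, which must lie below the threshold because that set is not frequent, so the error of this two-column estimate is smaller than σ. Longer disjunctions drop more terms, and their errors add up accordingly.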


Similar resources

Multiple uses of frequent sets and condensed representations Extended

In interactive data mining it is advantageous to have condensed representations of data that can be used to efficiently answer different queries. In this paper we show how frequent sets can be used as a condensed representation for answering various types of queries. Given a table r with 0/1 values and a threshold σ, a frequent set of r is a set X of columns of r such that at least a fraction σ of th...


A Survey on Condensed Representations for Frequent Sets

Solving inductive queries which have to return complete collections of patterns satisfying a given predicate has been studied extensively the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to tens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task ...


Separating Structure from Interestingness

Condensed representations of pattern collections have been recognized to be important building blocks of inductive databases, a promising theoretical framework for data mining, and recently they have been studied actively. However, there has not been much research on how condensed representations should actually be represented. In this paper we propose a general approach to build condensed repr...


Numerical Abstract Domain using Support Functions (Extended Version)

An abstract interpretation based static analyzer depends on the choice of both an abstract domain and a methodology to compute fixpoints of monotonic functions. Abstract domains are almost always representations of convex sets that must provide efficient algorithms to perform both numerical and order-theoretic computations. In this paper, we present a new abstract domain that uses support funct...


Itemset Support Queries Using Frequent Itemsets and Their Condensed Representations

The purpose of this paper is two-fold: First, we give efficient algorithms for answering itemset support queries for collections of itemsets from various representations of the frequency information. As index structures we use itemset tries of transaction databases, frequent itemsets and their condensed representations. Second, we evaluate the usefulness of condensed representations of frequent...



Publication date: 1996